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protein which requires adoption of a specific conforma- 
tion. Folding into it becomes increasingly difficult when 
N becomes larger and larger. The sizes of proteins are 
substantially smaller than those of the DNA molecules 
whose coding function does not depend on the shape. 
The native structure of a protein, on the other hand, is 
believed to be a decisive factor in its folding mechanism 
(Baker, 2000; Takada, 1999). 

A simple parameter that is used to characterize the struc- 
ture of the protein is the relative contact order, CO, 
(Plaxco et al., 1998) defined as average sequence dis- 
tance between two aminoacids that interact with each 
other, i.e. form a contact, in the native state: 
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where is if the amino acids i and j do not form 
a contact and 1 otherwise. The relative contact order 
parameter is small for a- proteins in which all secondary 
structures consist of the a-helices because the hydrogen 
bonds in the helices correspond to |? — j| — 4. On the 
other hand, /?-proteins tend to have larger CO because 
the (3 strands that form a sheet often involve amino acids 
which are quite distant along a sequence. 
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Molecular dynamics simulations in simplified models allow 
one to study the scaling properties of folding times for many 
proteins together under a controlled setting. We consider 
three variants of the Go models with different contact po- 
tentials and demonstrate scaling described by power laws and 
no correlation with the relative contact order parameter. We 
demonstrate existence of at least three kinetic universality 
classes which are correlated with the types of structure: the 
a-, a-/3-, and /3- proteins have the scaling exponents of about 
1.7, 2.5, and 3.2 respectively. The three classes merge into 
one when the contact range is truncated at a 'reasonable' 
value. We elucidate the role of the potential associated with 
the chirality of a protein. 

INTRODUCTION 

How do size and structure of a protein affect its folding 
kinetics is an interesting basic issue that has been de- 
bated in recent years. The size can be characterized by 
the number, N, of the amino acids that the protein is 
made of. The distribution of N across proteins stored 
in the data banks is peaked around N—lOO (Cieplak & 
Hoang, 2000) and all proteins with a large N, like titin 
{N « 30 000), consist of many domains. There must be 
then a mechanism that prevents globular proteins from 
reaching much larger sizes. We have argued (Cieplak & 
Hoang, 2000) that this is provided by the function of the 



In their seminal 1998 paper (Plaxco et al., 1998) (paper 
I), Plaxco, Baker, and Simons have argued that folding 
rates correlate with CO but do not with N. Their argu- 
ment was based on analysing experimental data on short 
proteins that were available in the literature. Their con- 
clusion was reinforced in the 2000 paper (Plaxco et al., 
2000) (paper II) by Plaxco, Simons, Ruczinski, and Baker 
in which the compilation of the kinetic data involved a 
larger set of proteins, including those that were consid- 
ered in paper I. The later data were also restricted to a 
much narrower temperature range of between 20 and 25 
°C. Their results for the folding times (i.e. the inverses of 
the folding rates) are represented in Figure 1 as a func- 
tion of N (on the logarithmic scale). For the purpose of 
further discussion, we have divided the data into three 
classes: a-proteins, /3-proteins, and Q;-/3-proteins. The 
a-proteins are easily seen to be the fastest folders but 
clearly all of the data points are scattered all over the 
plane of the figure. 

This random looking pattern of the data may, however, 
be only apparent since the plot might involve mixing dis- 
tinct classes of proteins that perhaps should not be com- 
pared together. Figure 2 indeed hints at such a possibil- 
ity as the splitting into the a-, (3-, and a-(3- structural 
classes reveals some patterns. These patterns are shown 
in different time windows - the a-proteins are in the win- 
dow of much shorter times. There is a growing trend for 
the [3- proteins and, if one disregards one outlayer, also 



for the Q:-/3-proteins. The data for the a-proteins, how- 
ever, arc puzzling since if they do show an overall trend 
then it would be downwards, i.e. the bigger the N, the 
shorter the folding time which defies a common simple 
expectation to observe the opposite. 

The combined data show a strong correlation with the 

CO parameter. When the data arc split into the three 
structural classes, as shown on the right-hand side of Fig- 
ure 2, then the correlation remains strong for the a- and 
Q!-/3-protcins. However, in the crucial test case of the 
/3-proteins (the right-bottom panel of Figure 2) four pro- 
teins have nearly the same CO and yet substantially dif- 
ferent folding times. Thus there are some unsettling is- 
sues in our understanding of the experimental data that 
would be desirable to solve. 

Theoretical modeling in simplified models, despite its 
well known general shortcomings, is expected to be a tool 
of help to identify possible trends because a unified ap- 
proach can be applied to many different proteins. In this 
paper, we consider 51 proteins: 21 of the a-0 kind with 
N between 29 and 162, 14 of the a-proteins with N be- 
tween 35 and 154, and 16 /3-proteins with N between 36 
and 124. This set contains the 21 proteins, used in Fig- 
ures 1 and 2, that were considered by Plaxco et al. All of 
the 51 proteins are modelled in three different ways and 
studied by the techniques of molecular dynamics. Even 
though the three models are all coarse grained and of the 
Go type (Abe & Go, 1981; Takada, 1999) they have very 
different kinetic and equilibrium properties when used for 
a particular protein. The variations between the models 
do lead to some differences in scaling properties of cer- 
tain parameters, such as the temperature of the fastest 
folding, Tmim or the thermodynamic stability tempera- 
ture, Tf, but they all agree on a power law dependence 
of the folding time, tfoid, on N 

tfoid ~ (2) 

and on the lack of any correlation of tfoid with CO. The 
problems with the experimental results on the N depen- 
dence may be related to the lack of the temperature op- 
timization. The folding time often depends on the tem- 
perature, T, and making choices on the temperature to 
study kinetics may affect the outcome of the measure- 
ment. We argue that a demonstrable trend might arise 
when all data are collected at Tmin which needs to be 
determined for each protein individually. 

The theoretically derived lack of correlation of tfoid with 
CO seems to be a more difficult issue. One may just dis- 
miss it as characterizing not real life but an approximate 
model. On the other hand, the essence of the Go models 

is that they are based on the native topology. Thus if 
such geometry sensitive models do not 'care' about the 



contact order then what models would? We leave it as 
an open question and this paper may be just considered 
to be a report on what are the properties of three dif- 
ferent Go-like models. Notice, however, that once a Go 
model is constructed its contacts are well defined and the 
kinetics are studied in the context of such a definition 
whereas assignment of contacts in experimental systems 
is subjective. It should be pointed out that the contact 
order in the Go models is actually quite important but 
not for the overall folding time - it is the primary fac- 
tor that governs the succession of events during folding 
(linger, 1996; Hoang & Cieplak, 2000a; Hoang & Cieplak, 
2000b; Cieplak et al., 2002a; Cieplak et al., 2002b; Er- 
man, 2001). In other words, what is important for folding 
of a protein is the full " spectrum" of the relevant values 
of the sequence distances and not just their average 
value. A similar point has been argued within a host of 
models in references (Galzitskaya and Finkelstein, 1999; 
Aim and Baker, 1999; Munoz and Eaton, 1999; Du et al., 
1999; and Plotkin and Onuchic, 2000). 

The power law dependence described by eq. 2 has been 
proposed by Thirumalai (Thirumalai, 1995) and then 
demonstrated explicitly for several types of lattice mod- 
els (Gutin et al., 1996; Zhdanov, 1998; Cieplak et al., 
1999). On the other hand, a number of theories and a 
recent simulation of 18 proteins (away from the optimal 
folding condition) by Koga and Takada (Koga & Takada, 
2001) suggest a power law dependence for barrier heights 
on N and hence an exponential dependence of tfoid on 
N (Takada & Wolynes, 1997; Finkelstein & Badredtinov, 
1997; Wolynes, 1997). Thus the issue of scaling remains 
unsettled not only experimentally but also theoretically. 
Recently (Cieplak & Hoang, 2001) we have demonstrated 
the power law dependence for one variant (of the three 
studied here) Go- like models when applied to 21 pro- 
teins which were mostly of the a-(3 kind. The resulting 
exponent A turned out to be equal to 2.5 ± 0.2. In this 
particular variant of the Go model, the native contact 
interactions were restricted to a cut-off value of 7.5A and 
the contact potential was described by the Lennard- Jones 
form. 

Here, we extend such studies to the other two kinds of 
proteins, a and /3, and arrive at a similar value of A. How- 
ever, when the model is made significantly more realistic 
by considering the range of the native contact interac- 
tions as a variable quantity, then we arrive at a richer 
picture. We show that the three classes of tertiary struc- 
tures also correspond to three different kinetic universal- 
ity classes. The a-proteins come with A of around 1.7 
(the result obtained previously (Cieplak & Hoang, 2001) 
for decoy helical structures), the /3-proteins are character- 
ized by A close to 3.2, and the a-/3-proteins have A near 
2.5. These values do not depend on whether the contact 
potential are Lennard- Jones or of the 10-12 form so they 



are truly a reflection of the native topology. The power 

law trends arc pretty evident when the folding times arc 
determined at Tmin but harder to see otherwise. In these 
studies, the range of the contact interactions has been de- 
termined based on the van dcr Waals radii of the atoms 
(Tsai et al., 1999). Another realistic item that we imple- 
ment is the chirality potential - a term which is respon- 
sible for folding to a conformation of the correct native 
chirality. This term affects the kinetics but we show it 
not to affect values of the exponent A. 

The growth of t fold with N indicates increasingly deteri- 
orating folding conditions. Our studies of scaling of Tmin 
and Tf indicate that asymptotically Tj becomes siibstan- 
tially lower than Tmin which signifies an onset of slow 
glassy kinetics before the system is near the native con- 
formation. This adds to the deterioration of foldability 
and suggests the limitation in the observed values of A^. 
The three models considered here have Tmin and Tf vary- 
ing as a function of A'^ in different ways, though they agree 
asymptotically. Among the three models, the Lennard- 
Jones contact potential with the variable Rc appears to 
have the most appealing kinetic properties in that it leads 
to a very good foldability for a small A^. This should be 
our simple model of choice in future studies. However, 
the issue of the scaling trends needs now to be studied in 
models that reach beyond the Go approximation and in 
experiments with a protocol that involves optimization. 

MATERIALS AND METHODS 

A. The Hamiltonian 

An input for the construction of the Go model is a 

PDB file (Bernstein et al., 1977) with the coordinates 
of all atoms in the native conformation. The coordi- 
nates are used to determine the length related param- 
eters of the model. Whereas all energy and temper- 
ature related parameters are expressed in terms of a 
common unit - e. We model 51 proteins. In addi- 
tion to the proteins listed in the caption of Figure 1, 
we also consider lcti(29), lcmr(31), lerc(40), lcrn(46), 
7rxn(52), 5pti(58), ltap(60), laho(64), lptx(64), 
lcrg(70), 1021(162) which arc of the a-f3 type, or un- 
structured, then lce4(35), lbba(36), lbw6(56), lrpo(61), 
lhp8(68), lail(73), lycc(103) which are of the a type, 
and lcbh(36), lixa(39), lcd7(45), lbq9(53), 2cdx(60), 
2ait(74), lbdo(80), lwit(93), lwho(94), 6pcy(99), 
Iksr(lOO), 4fgf(124) which are of the /? type. The symbols 
are the PDB codes and the numbers in brackets indicate 
the corresponding value of A^. The choice of these pro- 
teins was motivated by their size but otherwise random. 

We consider several variants of the Go models. In each 
case, the Hamiltonian consists of the kinetic energy and 
of the potential energy, Ep{{ri}), which is given by 
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The first term, V is the harmonic potential 
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which tethers consecutive beads at the equilibrium bond 
length, do, of 3.8A. Here, r^^i+i = jr^ — r^+ij is the dis- 
tance between the conscicutive beads and k = lOOe/A^, 
where e is the characteristic energy parameter corre- 
sponding to a native contact. 

The native contacts are defined either through the dis- 
tances between the C" atoms or through an all-tom con- 
sideration. The first choice, used by us previously (Hoang 
& Cieplak, 2000a; Hoang & Cieplak, 2000b; Cieplak & 
Hoang, 2001), is to take a uniform cut-off distance, Rc, 
of 7.5A, below which a contact is said to be present. In 
the second choice, used here in most cases, all the heavy 
atoms present in the PDB file are taken into account. 
Specifically, a pair of aminoacids is considered to form a 
contact if any pair of their non-hydrogen atoms have a 
native separation which is smaller than 1.244 (i?, -I- Rj), 
where Ri are the van der Waals radii of atom i, as listed 
in ref. (Tsai et al., 1999). This critical separation cor- 
responds to the point of inflection of the Lennard-Jones 
potential. Figure 3 shows the distribution of the effec- 
tive contact ranges as obtained for an A^=162 protein 
T4 lysozyme with the PDB code 1021 which consists of 
10 a-heliccs and 3 /3-strands. There are 339 native con- 
tacts in this case and they range in value between 4.36 
and 12.80 A. It is clear that truncating this distribution 
at whatever "reasonable" value, which is often taken to 
be in the range between 6.5 and 8.5 A would result in 
a substantial removal of the relevant interactions. Thus 
insisting on a uniform cutoff value is expected to have 
noticeable dynamical effect. 

We consider two variants of the interactions in the native 
contacts. The first variant is the 6-12 Lennard-Jones 
potential 
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where the sum is taken over all native contacts. The pa- 
rameters cTjj are chosen so that each contact in the native 
structure is stabilized at the minimum of the potential, 
and cr = SA is a typical value. The second variant is the 
10-12 potential 
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where r|"^ coincides with the native distance. This po- 
tential is frequently used to describe hydrogen bonds 
(Clementi et al., 2000). For each pair of interacting 
amino acids, the two potentials have a minimum energy 
of — e and are cut off at 20 A. The non- native interac- 
tions, V^'-'^ , are purely repulsive and are necessary to 
reduce the effects of entanglements. They are taken as 
the repulsive part of the Lennard- Jones potential that 
corresponds to the minimum occurring at 5A. This po- 
tential is truncated at the minimum and shifted upward 
so that it reaches zero energy at the point of truncation. 

The final term in the Hamiltonian takes into account the 

chirality. Natural proteins have right handed helices but 
a Go model as described above involves chiral frustration: 
one end of a helix may want to fold into a right handed 
helix and another into a left handed one and " convincing" 
one end to agree with the twist of the other takes time 
and delays folding. Such a frustration would not arise 
naturally. In order to prevent it, we add a term which 
favors the native sense of the overall chirality at each 
location along the backbone. A chirality of residue i is 
defined as 



Ci = 
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(7) 



where Vj = rj+i — rj. A positive Cj corresponds to right- 
handed chirality. Otherwise the chirality is left-handed. 
The values of d are essentially between —1 and +1. The 
distribution of Ci in 21 Q;-/3-proteins considered in this 
study is shown in Figure 4. It is seen to be bimodal. The 
values in the higher peak correspond to locations within 
the helical secondary structures. The chiral part of the 
Hamiltonian is then given phenomenologically by 
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where Q is the step function (1 for positive arguments 
and zero otherwise), (7|^^^ is the chirality of residue i in 
the native conformation, and n is taken, in most cases, 
to be equal to e. However, a criterion for selection of its 
proper value remains to be elucidated. The idea behind 
this particular form of v'-^^^^ is that when the local chi- 
rality agrees with the native chirality then there is no 
effect on the energy. On the other hand, a disagreement 
in the chirality is is punished by a cost which is quadratic 
in chirality. 

yCHiR jj^g^g -(-j^g strongest effect on the helical structures. 
However, it affects the sense of a twist of the whole ter- 
tiary structure. The chirality term enhances the dynam- 
ical bias towards the native structure during the folding 
process and helps avoiding non-physical conformations 
such as left-handed helices. V'~^^^^ is a four-body poten- 
tial. In this respect this term is similar to potentials that 



involve dihedral angles (Veitshans et al., 1997; Clementi 
et al., 2000; Settanni et al., 2002). The dihedral terms 
enhance stability of a model of the protein but usually 
have no bearing on the chirality (Veitshans et al., 1997) 
unless they involve directly the values of native dihedral 
angles (Clementi et al., 2000; Settanni et al., 2002). 

B. The time evolution 

The time evolution of unfolded conformations to the na- 
tive state is simulated through the methods of molecular 
dynamics as described in details in (Hoang & Cieplak, 
2000a; Hoang & Cieplak, 2000b) (see also (Cieplak et 
al., 2002a: Cieplak et al., 2002b) in the context of the 
Lennard-Jones contact potentials. The beads represent- 
ing the amino acids are coupled to Langevin noise and 
damping terms to mimic the effect of the surrounding 
solvent and provide thermostating at a temperature T. 
The equations of motion for each bead are 



mr : 



-7r + F, + T 
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where m is the mass of the amino acids represented by 
each bead. A similar approach in the context of proteins 
has also been adopted in references (Guo and Thiru- 
malai, 1996; Berriz et al., 1997; and Eastman and Do- 
niach 1998). The specificity of masses has turned out to 
be irrelevant for kinetics (Cieplak et al., 2002a) and it is 
sufficient to consider masses that are uniform and equal 
to the average amino acidic mass. is the net force due 
to the molecular potentials and external forces, 7 is the 
damping constant, and F is a Gaussian noise term with 
dispersion sJI'^ksT. For both kinds of the contact po- 
tentials, time is measured in units of r = sjma'^ je^ where 
a is hA. This corresponds to the characteristic period of 
undamped oscillations at the bottom of a typical 6-12 po- 
tential. For the average amino acidic mass and e of order 
4kcal/mol, t is of order 3j3s. According to Veitshans et 
al. (Veitshans et al., 1997), realistic estimates of damp- 
ing by the solution correspond to a value of 7 near 50 
m/r. However, the folding times have been found to de- 
pend on 7 in a simple linear fashion for 7 > m/r (Hoang 
& Cieplak, 2000a; Hoang & Cieplak, 2000b; KHmov & 
Thirumalai, 1997). Thus in order to accelerate the simu- 
lations, we work with 7 = 2m/r but more realistic time 
scales are obtained when the folding times are multiplied 
by 25. The equations of motion are solved by means of 
the fifth order Gear predictor-corrector algorithm (Gear, 
1971) with a time step of 0.005r. 

The magnitude of the viscous effects, as controlled by 

the parameter 7, has to be sufficiently large so that the 
scenarios of the folding events are not dominated by the 
inertial effects. Otherwise the scenarios would depend on 
the spacial and not on the sequencial separation between 
the amino acids. Figure 5, for crambin as an illustration, 



shows that even though our value of 7 of 2 is reduced 

compared to the vahics that arc expected to be reaHs- 
tic it already corresponds to sufficiently strong damping 
with the minimal inertial effects. Figure 5 gives average 
first times needed to establish contacts separated by the 
sequence length |i — j| for three values of 7: 2, 12, and 
24 m/r. To the leading order, the times to establish the 
contacts (and also the folding times) are linear functions 
of 7 so one can show them together by proper rescaling. 
Furthermore, the whole pattern of the events is insensi- 
tive to the value of 7. Starting with this figure, we adopt 
the convention that the symbol sizes give measures of the 
error bars in the quantity that is plotted. 

The folding time is calculated as the median first pas- 
sage time, i.e. the time needed to arrive in the native 
conformation from an unfolded conformation. It is esti- 
mated based on between 101 and 201 trajectories. Tmin 
is defined as a temperature at which tfoid has a mini- 
mum value when plotted vs. T. For small values N, the 
U-shaped dependence of t fold on N may be very broad 
and then T„j„ is defined as the position of the center of 
the U-shaped curve. The simplified criterion for an ar- 
rival in the native conformation to be declared is based 
on a simplified approach in which a protein is considered 
folded if all beads that form a native contact are within 

(n) 

the cutoff distance of l.baij or 1.2r-j- for the 6-12 and 
10-12 potentials respectively. 

The stability temperature Tf is determined through the 
nearly equilibrium calculation of the probability that the 

protein has all of its native contacts established. Tj is 
the temperature at which this probability crosses ^ . The 
calculation is based on least 5 long trajectories that start 
in the native state in order to make sure that the sys- 
tem is in the right region of the conformation space. It 
should be noted that, in the literature, the frequently 
used estimate of the folding temperature is determined 
through the position of the maximum in the specific heat. 
This yields a which is typically larger than Tf. Our 
probabilistic interpretation has the disadvantage of being 
dependent on the precise definition of what constitutes 
the native basin (and thus only the approximate loca- 
tion of Tf is of relevance) but it has the advantage of 
relating only to the native basin and not to any other 
valleys in the phase space. In most of our systems, Tf 
is found to be comparable to Tmin, while both of them 
are always lower than Tj. Furthermore, in most cases, 
even though when Tmin is found to be higher than Tf, 
the folding times at Tf are comparable to those at Tmin 
which indicates that the model is unfrustrated in the con- 
ventional sense. Only in some very few cases, the folding 
times at Tf are excessively long to be determined in our 
simulation. This behaviour probably corresponds to a 
structural frustration (Clementi et al., 2000) embedded 
in the native conformation. 



An alternative to the contact-based criterion for folding is 
to provide a more precise delineation of the native basin 
as in ref. (Hoang & Cieplak, 2002b) or relate the crite- 
rion to a cutoff in the value of the RMSD distance away 
from the native conformation. These approaches are il- 
lustrated in Figure 6 which shows the dependence of the 
folding time, tfoid, vs. T for a synthetic a-helix (II16 of 
reference (Hoang & Cieplak, 2002a)) and /?-hairpin (B16 
of the same reference) that both consist of 16 monomers. 
Whichever criterion for folding is used, the folding curves 
are U-shaped and the non-zero chirality term extends 
the region of the fastest folding both towards the low 
and high temperature ends. For the hairpin, the effect is 
smaller but still clearly present. 

When it comes to model proteins, wc used only the 
contact-based folding criterion. An illustration of the 
role of the chirality potential is provided in Figure 7 for 
crambin (iV=46, the PDB code Icrn) which is a protein 
of the a-fi type. The top panel, for = 7.5A, shows 
that the shortest time of folding is somewhat reduced by 
yCHiR. ^Yic biggest impact is on the range of temper- 
atures at which folding is optimal, almost by the factor of 
2, especially in the low T regime. For the /3 proteins, the 
effect of the chirality potential is generally smaller. For 
the SII3 domain coded lefn the change due to yC"^^^ is 
hard to detect (not shown) but for the 127 globular do- 
main of titin, coded Itit, it is quite substantial on the low 
T side of the curve (Figure 8). We conclude that incor- 
poration of the chirality term in the Hamiltonian appears 
to reduce structural frustration in these models and thus 
makes the models more realistic. For all of the results 
presented here from now on (except for Figure 12), the 
chirality term is included. 

Another simple way to enhance the realism of the Go 

models is suggested by Figure 3: calculate the range of 
the contact potential instead of taking one uniform cutoff 
value. When we compare the case of the Lennard-Jones 
contact potential with the uniform or variable Rc then 
the nature of the effect on the kinetics strongly depends 
on the protein. For instance, for the protein Icrn (Figure 
7, bottom panel) there is essentially no difference. On 
the other hand, a dramatic narrowing of the U-curve is 
observed for Itit (Figure 8). 

On switching the 6-12 potential to the 10-12 potential 
all of the kinetic U-curves become substantially narrower 
(Figures 7 and 8). This is related to the fact that the 
potential well corresponding to the 10-12 potential is 
narrower which makes folding a task that requires more 
precision. Note, that the two potentials have the same 
energy (— e) at the minimum so the temperature scale are 
comparable. 



We have demonstrated that there are many ways to con- 
struct variants of the Go models and they all come with 
distinctive folding characteristics. 

RESULTS 

A. The 6-12 potential with the variable contact range 

Figure 9 shows the median values of tfoid at Tmin for 
the Lennard-Jones contact potential when the presence 
of the native contact is determined through the van der 
Waals sizes of the atoms (and with the ehirality tcnii in- 
cluded). Figure 9 data divides the data into the three 
structural classes. There are a few outlayers (one is the 
laps protein which appears to be a poor folder also ex- 
perimentally) but basically there are clear linear trends 
on the log-log scale which indicates validity of the power 
law, cq. (2). The values of the exponents 1.7 for the 
a-proteins and 3.2 for the /3-proteins agree with those 
found for decoy structures (Cieplak & Hoang, 2001). The 
decoy structures were constructed from homopolymers 
and the contact range was not variable due to the lack 
of atomic features in the decoys. Figure 10 replots the 
same data together to indicate that the trends identified 
in the classes are identifiably distinct. Thus the struc- 
tural classes also correspond to the kinetic universality 
classes. 

Figure 11 shows data equivalent to those on Figure 9 but 
now the folding times are determined at Ty, as an exam- 
ple of a situation that may be encountered away from the 
optimal conditions. The data points show a much larger 
scatter away form the trend identified at Tmin- The op- 
timal trend seems still dominant but it is so much harder 
to see. This should be analogous to results obtained ex- 
perimentally. 

It is interesting to figure out what is the effect of the ehi- 
rality potential on the scaling results. Figure 12 refers 
to the a-proteins and it compares the case of k = to 
K = e. Proteins with small values of N are not sensitive 
to the value of k but for >w 50 taking the ehiral- 
ity into account accelerates the kinetics quite noticeably. 
The 'asymptotic' scaling behavior remains unchanged - 
the exponent A of 1.7 is valid for both eases, though a 
somewhat larger value for k=0 cannot be ruled out (but 
certainly not as large as 2.5). We have checked that 
the data points for k = 2e, though corresponding to a 
bit faster times than for k = e, are in practice indistin- 
guishable from the latter in the scale of the figure. This 
observation suggests a behavior which saturates with a 
growing k. 

As pointed out in Ref. (Cieplak et al., 1999), the depen- 
dence of Tf and Tmin on N may offer additional clues 



about the foldability at large N. Figure 13 suggests that 

the a- and Q!-/3-proteins are excellent folders for small 
values of A^ since then Tmin is less than Tf. Tf ap- 
pears to have no systematic trend with N but the data 
for Train suggcst a weak growth, approximately propor- 
tional to log{N). Around A^ of 50 the trend associated 
with Tmin crosses the average value of Tf and from now 
on Tf is lower than Tmin- This suggests that asymptot- 
ically the energy landscape of the system would be too 
glassy-like to sustain viable folding. Thus accomplishing 
folding would require breaking into independently folding 
domains domain or receiving an external assistance, e.g. 
from chaperons whereas our studies are concerned with 
individual proteins. Figure 13 also suggests that the /3 
proteins behave somewhat differently since they exhibit 
no trend in Tmin the range studied and already for 
small values of A'^ Tmin exceeds Tf. Nevertheless the dif- 
ferences between the three structural classes are minor 
because they all show a border line behavior: the pro- 
teins in the range up to A'^=162 are not excellent but 
just adequate folders, at least in this model. 

It is interesting to point out that neither f, foui nor the 
characteristic temperatures indicate any demonstrable 
correlation with the relative contact order defined in eq. 
1. This is shown in Figure 14: for a given value of CO 
we find systems both with long and short folding times 
or both high and low values of Tmin- 

B. The 10-12 potential with the variable contact range 

We now check the stability of our results against the 
change in the form of the contact potential with the same 
characteristic energy scale. Figure 15 shows that when 
the Lennard-Jones potential is replaced by the 10-12 po- 
tential, with keeping all other Hamiltonian parameters 
intact, the scaling trends for Tfoid are consistent with 
those displayed in Figure 9 and confirm the existence of 
the three universality classes. 

Figure 16 suggests that the 10-12 systems are also bor- 
der line in terms of the positioning of Tmin vs Tf but 
the weak growing trends for the a- and a-/3-proteins are 
gone. The lack of correlations with the relative contact 
order also holds for the 10-12 potential (not shown). 

C. The 6-12 potential with Rc = 7.5A 

We now return to the Lennard-Jones potential and make 
the drastic, as evidenced by Figure 3, change that only 
those native contacts are considered whose range docs 
not exceed 7.5A. The resulting data are shown in Figure 
17. The top panel indicates that A of about 2.5 is still 
consistent with the trend obtained. However, A of 1.7 is 
quite off the mark for the a-proteins. The exponent of 3.2 
for the /3-proteins is not ruled out but the scatter in the 



data points is bigger than in the bottom panel of Figure 

9. Taken together with the resuhs for the a-protcins, the 
most hkely conclusion is that the fixed, and invasive, cut 
off in the contact range looses the ability to distinguish 
between the structural classes and all such models of the 
proteins would be characterized by a single exponent A 
of 2.5 as found in ref. (Cieplak & Hoang, 2001). This is 
illustrated in Figure 18 where the data corresponding to 
various structural classes are displayed together. They 
seem to be consistent with just one trend. 

Figure 19 shows Tmin and Tf for the case with Rc = 7.5A. 
It suggests that among the three models studied here, the 
one with the cut off in the contact range is the worst ki- 
netically because the gap between the band of values of 
Train and the band of values of Tf is the largest. This 
indicates that precise values of the contact range are im- 
portant in the task of putting pieces of a protein together 
in the folding process. Also in this model, there is no cor- 
relation with the relative contact order parameter. 

DISCUSSION 

We have studied 3 variants of the Go model through the 
molecular dynamics simulations and demonstrated the 
power law dependence of the folding time on N and lack 
of dependence on CO. Furthermore, the models with the 
variable contact range allow one to identify (at least) 
three kinetic universality classes corresponding to three 
different values of the exponent A. The lowest exponent 
found for the a- structures is consistent with the widely 
held belief that the a-helices are structures that are op- 
timal kinetically (Micheletti et al., 1999; Maritan et al., 
2000). The scaling behavior of Tmin and Tf, taken to- 
gether with the increasing tfoid suggests an asymptotic 
emergence of a glassy behavior. As a technical improve- 
ment, we have highlighted benefits of introducing the chi- 
rality potential. 

Recently, Koga and Takada (Koga & Takada, 2001) have 
also studied scaling of tf^id in proteins approximated by 
the Go model. They have considered the 10-12 potential 
that was augmented by potentials which involved the di- 
hedral angles (but no chirality). They have determined 
the folding temperature through the maximum in the 
specific heat. Their studies at T^, done for 18 proteins 
with N in the range between 53 and 153, suggest a tfou 
that exponentially depends on the relative contact order 
multiplied by N°-^. 

It is thus interesting to check on this conclusion in the 
framework of our approach. Figure 20 shows log{tfoid) 
vs. GOxN'^-^ for our best model, i.e. for the Lennard- 
Jones contact potential with variable contact range. It 
is clear that the data at Tmin (the left panels) show sig- 
nificantly less scatter than at Tf (the right panels) so 



the distinction between the power law and the exponen- 
tial function is certainly not due to considering different 
temperatures. Figure 20 does suggest a correlation with 
C0xA/'° <5 (the data plotted vs. 7V°-^ without the CO 
factor have a similar appearance indicating the irrele- 
vance of CO in such theoretical studies) and Koga and 
Takada quote a correlation level of 84% for their data. 
It is not very easy to distinguish between the power law 
and the exponential dependencies without a significant 
broadening of the range in the values of N. Figure 21 
shows the data of Figure 9 redisplayed on the log - linear 
scale. The exponential trends, tfoid ~ s.xp{N/S^), can- 
not be ruled out and the correlation levels are 75%, 94%, 
and 95% for the a /?, a, and [3 structural classes respec- 
tively whereas the corresponding values for the log-log 
plots are 81%, 97%, and 94%. Even though the power 
law fits appear better (or, in the case of the (3 proteins 
about the same) the important point is that the expo- 
nential fits also suggest existence of the three diff'erent 
kinetic universality classes since the characteristic values 
of the ^ parameter, as displayed in the Figure, are clearly 
distinct. Our trends displayed in Figure 9 seem much less 
scattered than those shown in Figure 20, especially in the 
right hand panels of Figure 20. However, while we argue 
in favour of the three universality classes and then the 
power laws, we see a need for further studies and better 
understanding of these issues. 

It has been found recently (Cieplak & Hoang. 2002c) that 
the kinetics of Go models are very sensitive to the selec- 
tion of what constitutes the proper set of the native con- 
tacts. For instance, if one declares a uniform cutoff range, 
Rc, between the C" atoms for making a contact, then the 
dependence of tfoU on Rc is strong and non-monotonic. 
Koga and Takada declare the contact as occurring if two 
non-hydrogen atoms in a pair of amino acids are in a dis- 
tance of less than either 5. 5 A or 6. 5 A (and it is stated 
that the results are stable with respect to this choice). 
Our definition of the contacts, on the other hand, in- 
volves the atomic sizes which yields a diff'erent contact 
map and leads to different folding times. 

The basic unsolved question is why do the folding times 
in various Go models do not depend on the contact or- 
der even though the primary ingredient of any Go model 
is the geometry of the native state of a protein. One 
technical problem with the contact order is that the very 
notion of a contact is fairly subjective. Consider, for in- 
stance, the G protein - the PDB code is Igbl for the 
structure determined by NMR and Ipga for the crystal- 
lographic structure. When we make use of the van der 
Waals radii then we get CO = 0.239 for Igbl and 0.250 
for Ipga. The alternative procedure is to consider two 
residues contacting if they contain non-hydrogen atoms 
within a distance of d. For d equal to 3, 4, 5, 6, 7 and 8 
1, our procedure yields CO of 0.194, 0.220, 0.235, 0.252, 



0.277, and 0.295 respectively (for the Ipga structure it is 

0. 257 if the cutoff of 6A is used - i.e. not very different). 
Plaxco et al. (Plaxco et al., 1998; Plaxco et al., 2000) 
used the value of d — 6A, and they quoted CO of 0.173 
for this case. The notable difference from our value arises 
from the fact that in their calculation (Plaxco - private 
communication), all of the contacts made by the atoms 
(i.e. up to dozens for a pair of amino acids) contribute to 
the value of CO if the corresponding distance does not ex- 
ceed d. Furthermore, the 'contacts' between consecutive 
residues (i.e. between i and i + are taken into account. 
In our calculation, the shortest local contacts are of the 

1, i + 2 type. Note that the values of CO vary with d 
quite substantially (on the scale of the figures involved) 
and the value obtained at c? = 6A is about 45% larger 
than that quoted by Plaxco ct al. The important point, 
however, is not that much what is the absolute value of 
CO but whether its correlation with the folding rate is 
sensitive to the choice of a specific definition of CO that is 
adopted. We have found that, quite remarkably, this cor- 
relation in the set of the experimentally studied proteins 
remains strong even when our procedure for the calcula- 
tion of CO is used. We find that even though the scatter 
away from the trend is noticeably larger than when using 
the COp - the values of CO quoted by Plaxco et al. - 
the correlations with CO remain robust and some depen- 
dence on CO develops in the case of the /3-proteins. It 
is hoped that further interactions and iterations between 
theory and experiment will make the issues of size and 
contact order dependence more definitive. The notion of 
universality classes in proteins should play an important 
role in this process. 
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FIGURE CAPTIONS 

Figure 1. Experimentally determined folding times based 
on tables compiled by Plaxco et al. (Plaxco et al., 2000). 
The solid circles, open hexagons, and stars are for the 
a~f3-, a-, and (3- proteins respectively. 

Figure 2. Experimentally determined folding times as 
split into three structural classes. The panels on the 
left hand side show the dependence on N whereas 
the panels on the right hand side show the depen- 
dence on the relative contact order parameter. Note 
that the time window in the middle panels are shifted 
by two orders of magnitude compared to the other 
panels. The top panels corresponds to the following 
a-/3-proteins: ldiv(56), lgbl(56), 2ptl(63), 2ci2(65), 
laye(71), lubq(76), lhdn(85), 2ula(88), laps(98), and 
2vik(126), where the number in brackets indicates the 
value of TV (in the case of 2ci2 there are 19 more amino 
acids but their structure is undetermined). The middle 
panels correspond to the following a-proteins: 2pdd(43), 
2abd(86), limq(86), llmb(92), lhrc(104), 256b(106), and 
lf63(154). The bottom panels correspond to the following 
/3-proteins: lefn(57), lcsp(67), lten(89), and ltit(89). 
The papers by Plaxco et al. (Plaxco et al., 1998; Plaxco 
et al., 2000) also contain data on several other proteins 
that are not shown here - we restrict ourselves only to 
the proteins that we study through simulations. (We had 
difficulties with the identification of the proper structure 
files for the remaining proteins). The subscript P in COp 
signifies the criterion of Plaxco et al. (Plaxco et al., 1998; 
Plaxco et al., 2000) for a formation of a contact: two 
residues are considered to be contacting if they contain 
non-hydrogen atoms within the distance of QA. The sym- 
bols are as described in the caption of Figure 1. 

Figure 3. The distribution of the effective contact lengths 
in T4 lysozyme as determined by the procedure which 
is based on the van der Waals radii of the atoms. The 
shaded region corresponds to the contacts that would not 
be included if the cutoff of l.f>A was adopted. 

Figure 4. The distribution of the chirality parameter C 
in 21 a-(}- proteins studied. 

Figure 5. Times to establish contacts of a given sequence 
separation, |i — j| for crambin and for the indicated val- 
ues of the damping constant 7. The times are rescaled so 
that k is equal to 1, 6, and 12 for 7 equal to 2, 12 and 24 
m/r respectively and shown top to bottom. The symbols 
corresponding to 7 = 12m, /t are reduced in size for clar- 
ity. The magnitude of the remaining symbols indicates 
the size of the error bars. The model used here corre- 
sponds to the Lennard-Jones contacts and the contacts 
are determined based on the van der Walls radii. The 
criterion for establishing a contact (for the first time) is 
based on whether the two beads come within a distance 



of l.ScTjj of each other. This figure illustrates existence of 

second order effects in the dependence on 7 because the 
rescaling by k brings the data points for a given event 
together but there is no strict overlapping. 

Figure 6. The dependence of the folding time on 
temperature for "synthetic" secondary structures of 16 
monomers. The top two panels are for the a-hclix system 
H16 and the bottom panel is for the /3-hairpin B16. The 
dotted lines correspond to the chirality potential (with 
K=l) included and the solid lines arc for the case when 
it is not. In all cases, Rc = 7.5A. The top panel corre- 
sponds to the contact based criterion whereas the other 
panels is for the criterion based on the cutoff RMSD of 
O.2I. 

Figure 7. The dependence of the folding time on temper- 
ature for various Go models of crambin. The top panel is 
for the contact cutoff range of 7.5A whereas the bottom 
panel is for the locally calculated contact ranges. On 
the top panel, the dotted line corresponds to the case 
with the chirality potential and the solid line - without. 
On the bottom panel, both curves include the chirality 
potential. Here, the solid (dashed) line is for the 6-12 
(10-12) contact potential. The arrows indicate values of 
the folding temperature Tj. The heavier (lighter) arrow 
is for the 6-12 (10-12) potential. 

Figure 8. The dependence of the folding time on temper- 
ature for models of the protein Itit. The symbols are as 
in Figure 6: the thin solid line and the triangular data 
points arc for Rc = 7. 5 A and no chirality; the dotted 
line with the square data points are for R^ = 7.5A and 
with the chirality; the thick solid line with the solid cir- 
cular data points are for the locally calculated Rc and 
the Lennard-Jones contact potential with the chirality; 
the dashed line with the open circular data points are for 
the similar case with the 10-12 potential. The arrows in- 
dicate the values of T/ for the contacts of variable range: 
thick for the Lennard-Jones case and thin for the 10-12 
case. 

Figure 9. The scaling of tfoid with N for the 51 pro- 
teins as modeled by the 6-12 contact potential with the 
variable contact range. The data are split into the a-(3-, 
a-, and /3-proteins as indicated. The lines indicate the 
power law behavior with the A exponent displayed in the 
right hand corner of each panel. The error bars in the 
exponent are of order ±0.2. The folding times are calcu- 
lated at Tmin- The correlation levels of the points shown 
are 81%, 97% and 94% for the top, middle and bottom 
panels respectively. 

Figure 10. This Figure replots the data points of Figure 
8 in one panel. For clarity, two of the most distant out- 
layers in each class are not shown. The solid, dotted, and 



broken lines correspond to the slopes of 3.2, 2.5, and 1.7 
respectively. The correlation level is 87%. 

Figure 11. Same as Figure 8 but the folding times are 

determined at Tj instead at T„i;„ . The data points rep- 
resented by the arrows indicate values which are signifi- 
cantly off the frame of the figure (for which only the lower 
bound of 30000 t is known). The correlation levels are 
83%, 88% and 77% for the top to bottom panels respec- 
tively. 

Figure 12. The role of the chirality potential on the fold- 
ing times for the a-proteins. The hexagons are the data 

points shown in the middle panel of Figure 8 whereas the 
crosses correspond to the results obtained for k=0. 

Figure 13. The values of T,„i„ and Tj shown vs. A'^ 
for the Lennard-Jones potential with the variable con- 
tact range. The data points are divided into the three 
structural classes. 

Figure 14. The dependence of tfoid, Tmim and Tf on the 
relative contact order parameter for the Lennard-Jones 
contact potential with the variable contact range. The 
data symbols indicate the structural classes and are iden- 
tical to those used in Figures 8, 9, 10, and 11. 

Figure 15. Same as in Figure 8 but for the 10-12 contact 
potential. The correlation levels are 88%, 98% and 91% 
from top to bottom. 

Figure 16. Same as in Figure 12 but for the 10-12 con- 
tact potential. 

Figure 17. Same as in Figure 8 but for the Lennard- 
Jones potential with Rc = 7.5A. The correlation levels 
are 83%, 91% and 93% for the top to bottom panels re- 
spectively. 

Figure 18. Same as in Figure 9 but for the cutoff of l.bA 
in the range of the contact potential. The solid line has 
a slope of 2.5. The correlation level for all of the points 
is 88%. 

Figure 19. Same as in Figure 12 but for Rc = 7.5A. 

Figure 20. Logarithm of the folding time vs. COxA^"-^ 
for the three structural classes. The data correspond 
to the Lennard-Jones potential with the variable range. 
The left hand panels are for T — Tmin and the right hand 
panels for T = Tf. Note that the horizontal scale in this 
figure is linear, not logarithmic as in most previous fig- 
ures. The arrows, like in Figure 11, indicate data points 
which are significantly off the scale of the frame of the 
figure. 



Figure 21. The data of figure 9 redisplayed on the log - 

linear plane. The dashed lines indicate fits to the expo- 
nential law tfoid ~ s.xp{h/^) with the values of ^ shown 
in the right hand corner of each panel. The correlation 
levels are 75%, 94% and 95% for the top to the bottom 
panels respectively. The overall correlation level is 82% 
whereas for the power law fit it is 86%. The correspond- 
ing numbers for the 10 T2 potential and the Lennard- 
Jones with the cutoff of 7.5l are 87%, 89% and 81%, 
88%. The fitted values of ^ for the 10-12 potential are 
about the same as for the Lennard-Jones case. 

Abbreviations used: PDB, Protein Data Bank; NMR, 
nuclear magnetic resonance. 
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